class: center, middle, inverse, title-slide .title[ # FIN7028: Times Series Financial Econometrics ] .subtitle[ ## linear and non-linear time series models ] .author[ ### Barry Quinn ] .date[ ### 2023-03-09 ] --- layout: true <div class="my-footer"> <span> Barry Quinn CStat </span> </div> --- class: middle # Learning outcomes .salt[ - Stationary and differencing - Modelling stationary time series - ARIMA models (liner model) - Categories of algorithms in capital markets - Algorithmic ARIMA modelling - Practical example with some of the project data - Generative dynamic models at scale `Prophet Explained` ] --- class: middle # Stationarity and differencing .acid[ * The foundation of statistical inference in time series analysis is the concept of weak stationarity. ] .hand[A stationary series: * roughly horizontal * constant variance * no patterns predictable in the long-term ] --- class: middle .your-turn[ Are these financial time series stationary? .pull-left[ <!-- --> ] .pull-right[ <!-- --> ] ] --- class: middle .pull-left[ #### Inference and stationarity * The monthly log returns of Russell 2000 index vary around zero over time. * If we divide up the data into subperiods we would expect each sample mean to be roughly zero. * Furthermore, expect the recent financial crisis (2007-2009), the log returns range is approximately [-0.2,0.2]. * Statistically, the mean and the variance are constant over time OR time invariant. * Put together these to time invariant properties characterise a weakly stationary series. ] .pull-right[ #### Weak stationarity and prediction * Weak form stationarity provides a basic framework for prediction. * For the monthly log returns of the Russell 2000 we can predict with reasonable confidence: * Future monthly returns `\(\approx0\)` and vary `\([-0.2,0.2]\)` ] --- class: middle ## Inference and nonstationarity * Consider quarterly earnings for Carnival Plc. * If the timespan is divided into subperiods the sample mean and variance for each period show increasing pattern. * Earnings are **not** weakly stationary. * There does exist models and methods for modelling such nonstationary series. --- class: middle .your-turn[ Is the VIX time series Stationary? <!-- --> ] --- class: middle ## Non-stationarity in the mean .hand[Identifying non-stationary series] .large[ * time plot. * The ACF of stationary data drops to zero relatively quickly * The ACF of non-stationary data decreases slowly. * For non-stationary data, the value of `\(r_1\)` is often large and positive. ] --- class: middle ## Example: FTSE index .panelset[ .panel[ .panel-name[time plot] <!-- --> ] .panel[ .panel-name[ACF] <!-- --> ] .panel[ .panel-name[First differencing] <!-- --> ] .panel[ .panel-name[ACF after first differencing] <!-- --> ] ] --- class: middle # Differencing .fat[ * Differencing helps to **stabilize the mean**. * The differenced series is the *change* between each observation in the original series: `\({y'_t = y_t - y_{t-1}}\)`. * The differenced series will have only `\(T-1\)` values since it is not possible to calculate a difference `\(y_1'\)` for the first observation. ] --- class: middle ## carnival earnings ending 2010 Q1 .panelset[ .panel[ .panel-name[subset using `window()`] <!-- --> ] .panel[ .panel-name[use `log()` to stablise variation] <!-- --> ] .panel[ .panel-name[**then** seasonally difference] <!-- --> ] .panel[ .panel-name[**then** first difference] .hand[Is the series stationary?] <!-- --> ] ] --- class: middle .discussion[ * Seasonally differenced series is closer to being stationary. * Remaining non-stationarity can be removed with further first difference. * If `\(y'_t = y_t - y_{t-12}\)` denotes seasonally differenced series, then twice-differenced series i * When both seasonal and first differences are applied * it makes no difference which is done first the result will be the same. * If seasonality is strong, we recommend that seasonal differencing be done first because sometimes the resulting series will be stationary and there will be no need for further first difference. * It is important that if differencing is used, the differences are interpretable. ] --- class: middle ## Interpretation of differencing .heat[ * first differences are the change between **one observation and the next**; * seasonal differences are the change between **one year to the next**. * But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted. ] --- class: middle ## Unit root tests .pull-left[ >Statistical tests to determine the required order of differencing 1. Augmented Dickey Fuller test: null hypothesis is that the data are non-stationary and non-seasonal. 2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal. 3. Other tests available for seasonal data. ] .pull-right[ ## KPSS test ```r library(urca) summary(ur.kpss(ftse_m_ts)) ``` ``` ## ## ####################### ## # KPSS Unit Root Test # ## ####################### ## ## Test is of type: mu with 3 lags. ## ## Value of test-statistic is: 0.3983 ## ## Critical value for a significance level of: ## 10pct 5pct 2.5pct 1pct ## critical values 0.347 0.463 0.574 0.739 ``` ] --- class: middle ## Automatically selecting differences Seasonal strength `\(F_s = \max\big(0, 1-\frac{\text{Var}(R_t)}{\text{Var}(S_t+R_t)}\big)\)` If `\(F_s > 0.64\)`, do one seasonal difference. ```r carnival_eps_ts %>% log() %>% nsdiffs() ``` ``` ## [1] 1 ``` ```r carnival_eps_ts %>% log() %>% diff(lag=4) %>% ndiffs() ``` ``` ## [1] 0 ``` --- class: middle ## Non-seasonal ARIMA models #### Autoregressive models * When `\(y_t\)` has a statistically significant lag-1 autocorrelation, the lagged value `\(y_{t-1}\)` might be a useful in predicting `\(y_t\)`. * AR(1) model `$$y_{t}= c+\phi_{1}y_{t - 1} + \varepsilon_{t}$$` * where `\(\varepsilon_t\)` is white noise. * This is a simple linear regression with **lagged values** of `\(y_t\)` as predictors. * This simple model is widely used in stochastic volatility when `\(y_t\)` is replaced by its log volatility. --- class: middle ## Autoregressive models * More generally, if the `\(E(y_{t-1})\)` is determined by more than lag-1 we can generalise a AR(1) to an AR(p) model. .blockquote[ Autoregressive (AR) models: `$$y_{t}= c+\phi_{1}y_{t - 1}+\phi_{2}y_{t - 2} + \dots +\phi_{p}y_{t - p} + \varepsilon_{t},$$` where `\(\varepsilon_t\)` is white noise. This is a multiple linear regression with **lagged values** of `\(y_t\)` as predictors. ] --- class: middle # Example of an AR(1) model .panelset[ .panel[ .panel-name[Simulating an AR(1)] .pull-left[ * Simulating an `\(y_{t} =2 -0.8 y_{t - 1}+\varepsilon_{t}\)` * where `\(\varepsilon_t\sim N(0,1)\)` for `\(T=100\)`. ] .pull-right[ <!-- --> ] ] .panel[ .panel-name[Simulating an AR(2)] .pull-left[ * Simulating an `\(y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \varepsilon_t\)` * where `\(\varepsilon_t\sim N(0,1)\)` for `\(T=100\)`. ] .pull-right[ <!-- --> ] ] .panel[ .panel-name[AR(1) models explained] .blockquote[ `$$y_{t}=c + \phi_1 y_{t -1}+\varepsilon_{t}$$` * When `\(\phi_1=0\)`, `\(y_t\)` is **equivalent to White Noise** * When `\(\phi_1=1\)` and `\(c=0\)`, `\(y_t\)` is **equivalent to a Random Walk** * When `\(\phi_1=1\)` and `\(c\ne0\)`, `\(y_t\)` is **equivalent to a Random Walk with drift** * When `\(\phi_1<0\)`, `\(y_t\)` tends to **oscillate between positive and negative values**. ] ] ] --- class: middle ## Moving Average (MA) models .blockquote.midi[ Moving Average (MA) models: `$$y_{t} =c +\varepsilon_t + \theta_{1}\varepsilon_{t - 1} + \theta_{2}\varepsilon_{t - 2} +\cdots+\theta_{q}\varepsilon_{t - q},$$` - where `\(\varepsilon_t\)` is white noise. - This is a multiple regression with **past errors** as predictors. **Don't confuse this with moving average smoothing!** ] <!-- --> --- class: middle .hand-large[Putting it all together] ### ARIMA models - Autoregressive Integrated Moving Average models .blockquote[ Autoregressive Moving Average models $$ `\begin{align*} y_{t} = c+ \phi_{1}y_{t - 1} +\cdots +\phi_{p}y_{t-p} \\ \theta_{1}\varepsilon_{t - 1} + \cdots +\theta_{q}\varepsilon_{t-q} +\varepsilon_{t}. \end{align*}` $$ ] -- * Predictors include both **lagged values of `\(y_t\)` and lagged errors.** * Conditions on coefficients ensure stationarity. * Conditions on coefficients ensure invertibility. * Combine ARMA model with **differencing**. --- class: middle ## ARIMA models notation .pull-left[ >Autoregressive Integrated Moving Average models - ARIMA(p, d, q) model - AR part: p = order of the autoregressive part - I part: d = degree of first differencing involved - MA part: q = order of the moving average part. - ARIMA(1,1,1) model: `$$(y_t-y_{t-1}) =c + \phi_1 (y_{t-1}- y_{t-2}) +\theta_1\varepsilon_{t-1} + \varepsilon_t$$` ] .pull-right[ * White noise model: ARIMA(0,0,0) * Random walk: ARIMA(0,1,0) with no constant * Random walk with drift: ARIMA(0,1,0) with `constant term` * AR(p): ARIMA(p,0,0) * MA(q): ARIMA(0,0,q) ] --- class: middle ## ARIMA modelling of US consumption .panelset[ .panel[ .panel-name[The data] <!-- --> ] .panel[ .panel-name[fit an ARIMA(2,0,2)] ```r ((fit <- arima(uschange[,"Consumption"],order = c(2,0,2)))) ``` ``` ## ## Call: ## arima(x = uschange[, "Consumption"], order = c(2, 0, 2)) ## ## Coefficients: ## ar1 ar2 ma1 ma2 intercept ## 1.3908 -0.5813 -1.1800 0.5584 0.7463 ## s.e. 0.2553 0.2078 0.2381 0.1403 0.0845 ## ## sigma^2 estimated as 0.3417: log likelihood = -165.14, aic = 342.28 ``` ] .panel[ .panel-name[Model estimates in math] - `\(y_t = c + 1.391y_{t-1} -0.581y_{t-2}-1.18 \varepsilon_{t-1}+ 0.558\varepsilon_{t-2}+ \varepsilon_{t}\)` - where `\(c= 0.142\)` - and `\(\varepsilon_t\)` is white noise with a standard deviation of `\(0.585 = \sqrt{0.342}\)`. ] .panel[ .panel-name[Forecasts] ```r fit %>% forecast(h=10) %>% autoplot(include=80) ``` <!-- --> ] ] --- class: middle ## Understanding ARIMA models .large[ * If `\(c=0\)` and `\(d=0\)`, the long-term forecasts will go to zero. * If `\(c=0\)` and `\(d=1\)`, the long-term forecasts will go to a non-zero constant. * If `\(c=0\)` and `\(d=2\)`, the long-term forecasts will follow a straight line. * If `\(c\ne0\)` and `\(d=0\)`, the long-term forecasts will go to the mean of the data. * If `\(c\ne0\)` and `\(d=1\)`, the long-term forecasts will follow a straight line. * If `\(c\ne0\)` and `\(d=2\)`, the long-term forecasts will follow a quadratic trend. ] --- class: middle ## Understanding ARIMA models ### Forecast variance and `\(d\)` * The higher the value of `\(d\)`, the more rapidly the prediction intervals increase in size. * For `\(d=0\)`, the long-term forecast standard deviation will go to the standard deviation of the historical data. ### Cyclic behaviour * For cyclic forecasts, `\(p\ge2\)` and some restrictions on coefficients are required. * If `\(p=2\)`, we need `\(\phi_1^2+4\phi_2<0\)`. Then average length of stochastic cycles is `$$(2\pi)/\left[\text{arc cos}(-\phi_1(1-\phi_2)/(4\phi_2))\right].$$` * This formula has important uses in estimation business and economic cycles. (See Example 2.3 in Tsay (2010)) --- class: middle # Estimation and order selection ## Maximum likelihood estimation - Having identified the model order, we need to estimate the parameters `\(c,\phi_1,\dots,\phi_p \text{ }\theta_1,\dots,\theta_q\)`. * MLE is very similar to least squares estimation obtained by minimizing `\(\sum_{t-1}^T e_t^2\)` * The `Arima()` command allows CLS or MLE estimation. * Non-linear optimization must be used in either case. * Different software will give different estimates. --- class: middle ## Partial autocorrelations .blockquote[ - Partial autocorrelations} measure relationship between `\(y_{t}\)` and `\(y_{t - k}\)`, when the effects of other time lags `\(1,2, 3, \dots, k - 1\)`are removed. - `\(\alpha_k\)`= `\(k\)`th partial autocorrelation coefficient - `\(\alpha_k\)`{equal to the estimate of `\(b_k\)` in regression: `$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k}$$` * Varying number of terms on RHS gives `\(\alpha_k\)` for different values of `\(k\)`. * There are more efficient ways of calculating `\(\alpha_k\)`. * `\(\alpha_1=\rho_1\)` * same critical values of `\(\pm 1.96/\sqrt{T}\)` as for ACF. ] --- class: middle ## Example: US consumption <!-- --> --- class: middle ## ACF and PACF interpretation **AR(1)** `$$rho_k =\phi_1^k \text{ for k=1,2,}\dots$$` `$$\alpha_1= \phi_1 \alpha_k = 0\text{for k=2,3}\dots$$` So we have an AR(1) model when * autocorrelations exponentially decay * there is a single significant partial autocorrelation. --- class: middle ## ACF and PACF interpretation .three-column[ **AR(p)** * ACF dies out in an exponential or damped sine-wave manner * PACF has all zero spikes beyond the `\(p\)`th spike So we have an AR(p)) model when * the ACF is exponentially decaying or sinusoidal * there is a significant spike at lag `\(p\)` in PACF, but none beyond `\(p\)` ] .three-column[ **MA(1)** $$ `\begin{align*} \rho_1 &= \theta_1 \qquad \rho_k = 0\qquad\text{for k=2,3,...};\\ \alpha_k &= -(-\theta_1)^k \end{align*}` $$ So we have an MA(1) model when * the PACF is exponentially decaying and * there is a single significant spike in ACF ] .three-column[ **MA(q)** * PACF dies out in an exponential or damped sine-wave manner * ACF has all zero spikes beyond the `\(q\)`th spike So we have an MA(q) model when * the PACF is exponentially decaying or sinusoidal * there is a significant spike at lag `\(q\)` in ACF, but none beyond `\(q\)` ] --- class: middle ## Information criteria .blockquote[ **Akaike's Information Criterion (AIC):** `\(\text{AIC} = -2 \log(L) + 2(p+q+k+1),\)` where `\(L\)` is the likelihood of the data, `\(k=1\)` if `\(c\ne0\)` and `\(k=0\)` if `\(c=0\)`] .blockquote[ **Corrected AIC:** <br> `\(\text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2}.\)` ] .blockquote[ **Bayesian Information Criterion:**<br> `\(\text{BIC} = \text{AIC} + [\log(T)-2](p+q+k-1).\)`<br> Good models are obtained by minimizing either the AIC, AICc or BIC. My preference is to use the AICc.] --- class: middle # Powerful non-stationary model in finance - In financial time series an important class on non-stationary times series model is the random walk model - A random walk can be define as `\(y_t=y_{t-1}+ error_t\)` or its drift variation `\(y_t= constant + y_{t-1}+ error_t\)` .panelset[ .panel[ .panel-name[Simulation 1] .pull-left-narrow[ `\(y_t = 10 + 0.99y_{t-1}+ \varepsilon_t\)` ] .pull-right-wide[ <!-- --> ] ] .panel[ .panel-name[Simulation 1] .pull-left-narrow[ ] .pull-right-wide[ <!-- --> ] ] ] --- class: middle # Predictive Algorithms in Capital Market - Three categories 1. Computational Statistics 2. Machine Learning Algorithms 3. Complex Systems --- class: middle ## Computational Statistics - These models refers to computationally intensive statistical methods - Resampling methods (e.g., Bootstrap and Cross-Validation), - Monte Carlo methods, - Kernel Density estimation and other Semi and Non-Parametric methods - Generalized Additive Models (Efron and Hastie, 2016). - Regularisation Methods .footnote[ Bradley, E. & Trevor, H. Computer Age Statistical Inference. (Cambridge University Press, 2016); ] --- class: middle # Resampling methods .hand[A variety of methods for doing one of the following:] 1. estimating the precision of sample statistics using subsets of data (e.g. jackknifing) or drawn randomly from a set of data points (e.g. bootstrapping) 2. Exchanging labels on data points when performing significance tests (e.g. permutation tests); 3. Validating models by using random subsets (e.g. repeated cross-validation, for example the `tsCV()` from previous lectures. --- class: middle # Monte carlo methods .hand[A broad class of computational algorithms that rely on repeated random sampling to approximate integrals.] - Particularly used to compute expected values (e.g. options payoff) - Including those meant for inference and estimation (e.g., Bayesian estimation `stan_glm()`, `prophet()` , simulated method of moments) --- class: middle ## Kernel Density Estimation .hand[A set of methods used to approximate multivariate density functions from a set of datapoints; it is largely applied to generate smooth functions, reduce outliers effects and improve joint density estimations, sampling, and to derive non-linear fits.] ## Generalised Additive Models .hand[a large class of nonlinear models widely used for inference and predictive modelling (e.g. time series forecasting, curve-fitting, `prophet()`)] ## Regularisation Methods .hand[Regularisation methods are increasingly used as an alternative to traditional hypothesis testing and criteria-based methods, for allowing better quality forecasts with a large number of features.] --- class: middle ##AI and Machine Learning .hand[This AI continuum of epistemological models spans three main communities] - Knowledge-based or heuristic algorithms (e.g. rule-based) - where knowledge is explicitly represented as ontologies or IFTHEN rules rather than implicitly via code (Giarratano and Riley, 1998) - Evolutionary or metaheuristics algorithms - a family of algorithms for global optimization inspired by biological evolution, using population-based trial and error problem solvers with a metaheuristic or stochastic optimization character (e.g. Genetic Algorithms, Genetic Programming, etc.) (Poli et al., 2008; Brownlee, 2011) - Machine Learning algorithms - a type of AI program with the ability to learn without explicit programming, and can change when exposed to new data; mainly comprising Supervised (e.g. Support Vector Machines, Random Forest, etc.), Unsupervised (e.g. K-Means, Independent Component Analysis, etc.), and Reinforcement Learning (e.g. Q-Learning, Temporal Differences, Gradient Policy Search, etc.) (Hastie et al., 2009; Sutton and Barto, 2018). .footnote[Russell and Norvig (2016) provide an in-depth view of different aspects of AI.] --- class: middle # Complex Systems .hand[A complex system is any system featuring a large number of interacting components (e.g. agents, processes, etc.) whose aggregate activity is nonlinear (not derivable from the summations of the activity of individual components) and typically exhibit hierarchical self-organization under selective pressures (Taylor, 2014; Barabási, 2016).] --- class: middle # A rule-based ARIMA algorithm .blockquote[`auto.arima()`] - For a non-seasonal ARIMA process we first need to select appropriate orders: `\(p,q,d\)` - We use the [Hyndman and Khandakar (JSS, 2008)](https://www.jstatsoft.org/article/view/v027i03) algorithm: .blockquote[ * Select no. differences `\(d\)` and `\(D\)` via KPSS test and seasonal strength measure. * Select `\(p,q\)` by minimising AICc. * Use stepwise search to traverse model space. ] --- class: middle ## How does auto.arima() work? .blockquote[ `$$\text{AICc} = -2 log(L) + 2(p+q+k+1)\left[1 + \frac{(p+q+k+2)}{T-p-q-k-2}\right].$$` - where `\(L\)` is the maximised likelihood fitted to the differenced data, - `\(k=1\)` if `\(c\neq 0\)` and `\(k=0\)` otherwise. ] --- class: middle # Algorithmic rule 1: - Select current model (with smallest AICc) from: - ARIMA$(2,d,2)$ - ARIMA$(0,d,0)$ - ARIMA$(1,d,0)$ - ARIMA$(0,d,1)$ --- class: middle # Algorithmic rule 2: - Consider variations of current model: * vary one of `\(p,q,\)` from current model by `\(\pm1\)`; * `\(p,q\)` both vary from current model by `\(\pm1\)`; * Include/exclude `\(c\)` from current model. .heatinline[ Model with lowest AICc becomes current model. ] .large[Repeat rule 2 until no lower AICc can be found] --- class:middle ## Example: VIX index .panelset[ .panel[.panel-name[Human choice] .pull-left[ ```r tidyquant::tq_get('^VIX') %>% select(adjusted) %>% ts() -> vix_ts ggtsdisplay(log(vix_ts)) ``` ] .pull-right[ <!-- --> ] ] .panel[.panel-name[Human choice] .pull-left[ ```r ggtsdisplay(diff(log(vix_ts))) ``` ] .pull-right[ <!-- --> ] ] .panel[.panel-name[Human choice] .pull-left[ ```r (fit <- Arima(log(vix_ts),order=c(7,1,0))) ``` ] .pull-right[ ``` ## Series: log(vix_ts) ## ARIMA(7,1,0) ## ## Coefficients: ## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ## -0.0860 -0.0460 -0.0161 -0.0602 -0.0161 -0.0508 -0.0384 ## s.e. 0.0197 0.0198 0.0198 0.0198 0.0198 0.0198 0.0197 ## ## sigma^2 = 0.006258: log likelihood = 2867.76 ## AIC=-5719.52 AICc=-5719.46 BIC=-5672.73 ``` ] ] .panel[.panel-name[Algorithmic choice] .pull-left[ ```r auto.arima(log(vix_ts)) ``` ] .pull-right[ ``` ## Series: log(vix_ts) ## ARIMA(1,1,2) ## ## Coefficients: ## ar1 ma1 ma2 ## 0.9482 -1.0468 0.0567 ## s.e. 0.0136 0.0245 0.0217 ## ## sigma^2 = 0.0062: log likelihood = 2877.46 ## AIC=-5746.91 AICc=-5746.9 BIC=-5723.52 ``` ] ] .panel[.panel-name[Try harder algorithm] .pull-left[ ```r (fit.auto<-auto.arima(log(vix_ts), stepwise=FALSE, approximation=FALSE)) ``` ] .pull-right[ ``` ## Series: log(vix_ts) ## ARIMA(1,1,2) ## ## Coefficients: ## ar1 ma1 ma2 ## 0.9482 -1.0468 0.0567 ## s.e. 0.0136 0.0245 0.0217 ## ## sigma^2 = 0.0062: log likelihood = 2877.46 ## AIC=-5746.91 AICc=-5746.9 BIC=-5723.52 ``` ] ] .panel[.panel-name[Discussion] .discussion[ * Setting both `stepwise` and `approximation` arguments to `FALSE` will slow the automation down but provides a more exhaustive search for the appropriate model. * The `auto.arima` function then searches over all possible models using MLE. * See `help(auto.arima)` for more details. ] ] ] --- class: middle ## Human Vs Algo: Residual Diagnostics .panelset[ .panel[.panel-name[Human choice] .pull-left[ ```r checkresiduals(fit, test=FALSE) ``` ] .pull-right[ <!-- --> ] ] .panel[.panel-name[Human forecasting] .pull-left[ ```r fit %>% forecast(h=252) %>% autoplot ``` ] .pull-right[ <!-- --> ] ] .panel[.panel-name[Algorithm] .pull-left[ ```r checkresiduals(fit.auto,test = FALSE) ``` ] .pull-right[ <!-- --> ] ] .panel[.panel-name[Algo forecasting] .pull-right[ ```r fit.auto %>% forecast(h=252) %>% autoplot() ``` ] .pull-right[ <!-- --> ] ] ] --- class: middle ## Modelling procedure with `Arima` .blockquote[ 1. Plot the data. Identify any unusual observations. 2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance. 3. If the data are non-stationary: take first differences of the data until the data are stationary. 4. Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate? 5. Try your chosen model(s), and use the AICc to search for a better model. 6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model. 7. Once the residuals look like white noise, calculate forecasts. ] --- class: middle ## Modelling procedure with `auto.arima` 1. Plot the data. Identify any unusual observations. 2. If necessary, transform the data (using logs) to stabilize the variance. 3. Use `auto.arima` to select a model. 6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model. 7. Once the residuals look like white noise, calculate forecasts. --- class: middle # Project based example .hand[Russell 2000 index monthly returns] .pull-left[ ```r tsfe::indices %>% select(date,`RUSSELL 2000 - PRICE INDEX`) %>% rename(r2000=`RUSSELL 2000 - PRICE INDEX`) %>% drop_na() %>% tq_transmute(select =r2000,mutate_fun = periodReturn,type='log') ->monthly_r2002r ts(monthly_r2002r$monthly.returns, start = c(1988,1))->r2000r_m_ts autoplot(r2000r_m_ts) + xlab("Year") + ylab("monthly log returns") ``` <!-- --> ] .pull-right[ 1. Time plot shows sudden changes, particularly big movements in 2007/2008 due to financial crisis. Otherwise nothing unusual and no need for data adjustments. 2. Little evidence of changing variance, so no log transformation needed. 3. Data are clearly stationary, so no *differencing* required. ] --- class: middle # Project based example .hand[Russell 2000 index monthly returns] .pull-left[ ```r ggtsdisplay(r2000r_m_ts) ``` <!-- --> ] .pull-right[ 4. PACF is suggestive of AR(5). So initial candidate model is ARIMA(5,0,0). No other obvious candidates. 5. Fit ARIMA(5,0,0) model along with variations: ARIMA(4,0,0), ARIMA(3,0,0), ARIMA(4,0,1), etc. ARIMA(3,0,1) has smallest *AICc* value. ] --- class: middle # Project based example .panelset[ .panel[.panel-name[Manual fit] .pull-left[ ```r (fit <- Arima(r2000r_m_ts, order=c(3,0,1))) ``` ] .pull-right[ ``` ## Series: r2000r_m_ts ## ARIMA(3,0,1) with non-zero mean ## ## Coefficients: ## ar1 ar2 ar3 ma1 mean ## 1.0466 -0.0945 -0.0029 -1.0000 0.0062 ## s.e. 0.0509 0.0736 0.0511 0.0094 0.0004 ## ## sigma^2 = 0.002795: log likelihood = 586.84 ## AIC=-1161.68 AICc=-1161.46 BIC=-1137.96 ``` ] ] .panel[.panel-name[Residual check] 6. ACF plot of residuals from ARIMA(3,0,1) model look like white noise. ```r checkresiduals(fit, plot=FALSE) ``` ``` ## ## Ljung-Box test ## ## data: Residuals from ARIMA(3,0,1) with non-zero mean ## Q* = 7.5056, df = 6, p-value = 0.2766 ## ## Model df: 4. Total lags used: 10 ``` ] .panel[.panel-name[Forecasting] ```r fit %>% forecast(h=24) %>% autoplot ``` <!-- --> ] ] --- class: middle ## Understanding the ARIMA output ``` ## Series: r2000r_m_ts ## ARIMA(3,0,1) with non-zero mean ## ## Coefficients: ## ar1 ar2 ar3 ma1 mean ## 1.0466 -0.0945 -0.0029 -1.0000 0.0062 ## s.e. 0.0509 0.0736 0.0511 0.0094 0.0004 ## ## sigma^2 = 0.002795: log likelihood = 586.84 ## AIC=-1161.68 AICc=-1161.46 BIC=-1137.96 ``` --- class: middle ## Understanding the ARIMA output * The fitted model is: `$$y_t=1.047y_{t-1}-0.094y_{t-2}-0.003y_{t-3} +-1\varepsilon_{t-1}+0.008$$` * The standard errors are 0.13, 0.06, 0.05, 0.12 and 0.002, respectively. * This suggest that only the AR1 and the constant (mean) are more than 2 SEs away from zero and thus statistically significant. * The significance of `\(\phi_0\)` of this entertained model implies that the expected mean return of the series is positive. * In fact `\(\hat{\mu}=0.006/(1-(1.047-0.094-0.003)) =0.12\)` which is small but has long term implications. * Using the multi-period return definition from the financial data lecture an annualised log return is simple `\(\sum_1^{12} y_t\)` `\(\approx 1.44\)` per annum. --- class: middle ## Frequentist prediction intervals `$$\hat{y}_{T+h|T} \pm 1.96\sqrt{v_{T+h|T}}$$` where `\(v_{T+h|T}\)` is estimated forecast variance. -- * `\(v_{T+1|T}=\hat{\sigma}^2\)` for all ARIMA models regardless of parameters and orders. * Multi-step prediction intervals for ARIMA(0,0,$q$): `$$\displaystyle y_t = \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}$$` `$$\displaystyle v_{T|T+h} = \hat{\sigma}^2 \left[ 1 + \sum_{i=1}^{h-1} \theta_i^2\right], \qquad\text{for~} h=2,3,\dots.$$` * AR(1): Rewrite as MA($\infty$) and use above result. * Other models beyond scope of this subject. --- class: middle ## Prediction intervals * Prediction intervals **increase in size with forecast horizon**. * Prediction intervals can be difficult to calculate by hand * Calculations assume residuals are **uncorrelated** and **normally distributed**. * Prediction intervals tend to be too narrow. * the uncertainty in the parameter estimates has not been accounted for. * the ARIMA model assumes historical patterns will not change during the forecast period. * the ARIMA model assumes uncorrelated future `\(errors\)` --- class: middle .huge-text[Prophet algorithm] --- class: middle # What is Prophet? >Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. .acid[ - It works best with time series that have strong seasonal effects and several seasons of historical data. - Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. - It uses a **computational statistics** algorithm know as **generalised additive models** ] --- class: middle # How does it work? .pull-left[  ] .pull-right[ - The procedure makes use of a decomposable time series model with three main model components: .heatinline[trend, seasonality, and holidays.] - Similar GAM, with time as a predictor, Prophet fits several linear and non-linear functions of time as components. In its simplest form; ] --- class: middle .hand-large[The maths] .blockquote[ `$$y(t) = g(t) + s(t) + h(t) + e(t)$$` - where `\(g(t)\)` trend models non-periodic changes (i.e. growth over time) - `\(s(t)\)` seasonality presents periodic changes (i.e. weekly, monthly, yearly) - `\(h(t)\)` ties in effects of holidays (on potentially irregular schedules ≥ 1 day(s)) - e(t) covers idiosyncratic changes not accommodated by the model ] --- class: middle In other words, the procedure’s equation can be written; >Modeling seasonality as an additive component is the same approach taken by exponential smoothing… GAM formulation has the advantage that it decomposes easily and accommodates new components as necessary, for instance when a new source of seasonality is identified. - Prophet is essentially “framing the forecasting problem as a curve-fitting exercise” rather than looking explicitly at the time based dependence of each observation. --- class: middle .acid[Trend] - The procedure provides two possible trend models for g(t), “a saturating growth model, and a piecewise linear model.” ## Saturating Growth Model - If the data suggests promise of saturation, growth is viral-like such as bitcoin prices, then setting growth='logistic' is the move. - Typical modeling of these *nonlinear, saturating trends* is basically accomplished; .blockquote[ $$ g(t)=\frac{C}{1+exp(-k(t-m))}$$ where: C is the carrying capacity k is the growth rate m is an offset parameter - There are two primary aspects of growth at Facebook (**fluctuating carrying capacity** and **volatile rate of change**) that are not captured in this simplified equation, though. ] --- class: middle .acid[Trend] ### Carrying Capacity versus Time - First, as with many scalable business models carrying capacity is not constant — as the number of people in the world who have access to the Internet increases, so does the growth ceiling. - Accounting for this is done by replacing the fixed capacity C with a time-varying capacity `\(C(t)\)`. --- class: middle .acid[Trend] ### Rate of Change versus Time - Second, the market does not allow for stagnant technology. - Advances like those seen over the past decade in handheld devices, app development, and global connectivity, virtually ensure that growth rate is not constant. - Because this rate can quickly compound due to new products, the model must be able to incorporate a varying rate in order to fit historical data. --- class: middle .acid[Trend] # Trend changes growth models - In financial time series structural breaks a common > Roughly speaking a structural break change point is where some type of random shock changes the statistical properties of the time series, for example a permanent change in the mean value of the times series. - Prophet incorporates **trend changes** in the growth model by explicitly defining changepoints where the growth rate is allowed to change. --- class: middle .acid[Trend] - Suppose there are S changepoints at times `\(s_j, j = 1,…,S\)`. - Prophet defines a vector of rate adjustments; `$$\delta \in \R ^{S}$$` - where: `\(\delta_j\)` is the change in rate that occurs at time `\(s_j\)` - The rate at any time t is then the base rate `\(k\)`, plus adjustments up to that time; `$$k + \sum_{j:t >\delta_j}\delta_j$$` - This is represented more cleanly by defining a vector `$$a(t) \in \{0,1\}^S$$` - such that `$$a_j(t)= \begin{cases}1,\text{if }t\ge s_j, \\ 0,\text{otherwise} \end{cases}$$` --- class: middle .acid[Trend] >The rate at time t is then k+*a*(t)ᵀ*δ*. When the rate k is adjusted, the offset parameter m must also be adjusted to connect the endpoints of the segments. The correct adjustment at changepoint j is easily computed as- Taylor et al., (2017) `$$\gamma_j=\left(s_j-m-\sum_{l<j}\gamma l \right) \left(1-\frac{k+\sum_{l<j}\delta_l}{k+\sum_{l \le j} \delta_l} \right)$$` - At last, the piecewise growth=‘logistic’ model is reached; `$$g(t)=\frac{C(t)}{1+exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))}$$` --- class: middle .acid[Trend] > An important set of parameters in our model is C(t), or the expected capacities of the system at any point in time. Analysts often have insight into market sizes and can set these accordingly. There may also be external data sources that can provide carrying capacities,such as population forecasts from the World Bank.-Taylor et al., (2017) - In application, the logistic growth model presented here is a special case of generalized logistic growth curves — which is only a single type of sigmoid curve — allowing the relatively straightforward extension(s) of this trend model to other families of curves. --- class: middle .acid[Trend] ### Linear Trend with Changepoints - The second — much simpler and default — trend model is a simple Piecewise Linear Model with a constant rate of growth. - It is best suited for problems without a market cap or other max in sight, and is set via growth='linear'. - For forecasting problems that do not exhibit saturating growth, a piece-wise constant rate of growth provides a parsimonious and often useful model. - Modeling the linear trend is easily realized with Prophet. .blockquote[ `$$g(t)=(k+a(t)^T\delta)t+(m+a(t)^T\gamma)$$` where: k is the growth rate δ has the rate adjustments m is the offset parameter and, to make the function continuous, γ_j is set to: ] --- class: middle ## Automatic Changepoint Selection - If known, the changepoints `\(s_j\)` can be specified by the user as dates of known events such as the start of recessions and other growth-altering events, - Or by default, changepoints may be automatically selected given a set of candidates. >Automatic selection can be done quite naturally with the formulation in either model by putting a sparse prior on `\(\delta\)` - Often, it is advisable to specify a large number of changepoints (e.g. one per month for a several year history) and use the prior: `$$\delta_j \sim Laplace(0,\tau)$$` - where `\(\tau\)` directly controls the flexibility of the model in altering its rate .acidinline[Critical note: a sparse prior on the adjustments δ has no impact on the primary growth rate k, so as τ progresses to 0 the fit reduces to standard (not-piecewise) logistic or linear growth.] --- class: middle ## Trend Forecast Uncertainty - When the model is extrapolated past the history to make a forecast, the trend g(t) will have a constant rate; the uncertainty in the forecast trend is estimated by extending the generative model forward. - The generative model for the trend is that there are; - S changepoints - over a history of T points - each of which has a rate change `\(\delta_j \sim Laplace(0,\tau)\)` - Simulation of future rate changes (that emulate those of the past) is achieved by replacing τ with a variance inferred from data. --- class: middle ## Trend Forecast Uncertainty .hand[going Bayesian] .blockquote.midi[ - In a fully Bayesian framework this could be done with a hierarchical prior on τ to obtain its posterior, otherwise we can use the maximum likelihood estimate of the rate scale parameter: `$$\lambda=\frac{1}{S}\sum_{j=1}^S |\delta_j|$$` - Future changepoints are randomly sampled in such a way that the average frequency of changepoints matches that in the history: `$$\forall j>T, \begin{cases} \delta_j=0 \text{ w.p. } \frac{T-S}{T}, \\ \delta_j \sim \text{Laplace} (0,\lambda) \text{ w.p. } \frac{S}{T} \end{cases}$$` - Thus, uncertainty in the forecast trend is measured by assuming the future will see the same average frequency and magnitude of rate changes that were seen in the history. - Once λ has been inferred from the data, this generative model is deployed to “simulate possible future trends and use the simulated trends to compute uncertainty intervals.” ] --- class: middle ## Advantage of modeling uncertainty - .large[Prophet’s assumption that the trend will continue to change with the same frequency and magnitude as it has in the history is fairly strong, so don’t bank on the uncertainty intervals having exact coverage.] - As `\(\tau\)` is increased the model has more flexibility in fitting the history and so training error will drop. - Even so, when projected forward this flexibility is prone to produce wide intervals. - .heatinline[The uncertainty intervals are, however, a useful indication of the level of uncertainty, and especially an indicator of over fitting.] --- class: middle .salt[Seasonality] - The seasonal component s(t) provides a adaptability to the model by allowing periodic changes based on sub-daily, daily, weekly and yearly seasonality. >Business time series often have multi-period seasonality as a result of the human behaviors they represent. For instance, a 5-day work week can produce effects on a time series that repeat each week, while vacation schedules and school breaks can produce effects that repeat each year. To fit and forecast these effects we must specify seasonality models that are periodic functions of [time] t. - Taylor et al., (2017) - Prophet relies on [Fourier series](https://otexts.com/fpp2/dhr.html) to provide a malleable model of periodic effects. - P is the regular period the time series will have (e.g. P = 365.25 for yearly data or P = 7 for weekly data, when time is scaled in days). --- class: middle .salt[Seasonality: the maths] - Approximate arbitrary smooth seasonal effects is therefore tied in with a standard Fourier series; `$$s(t)= \sum_{n=1}^N \left(a_n cos\left(\frac{2 \pi nt}{P}\right) + b_n sin\left(\frac{2 \pi nt}{P} \right) \right)$$` >Fitting seasonality requires estimating the 2N parameters `\(\beta = \left[a_1,b_1,\dots,a_N,b_N \right]^T\)`. This is done by constructing a matrix of seasonality vectors for each value of t in our historical and future data, for example with yearly seasonality and N= 10: `$$X(t)=\left[cos \left(\frac{2 \pi (1)t}{365.25}\right),\dots,sin\left(\frac{2 \pi (10)t}{365.25}\right) \right]$$` - Meaning the seasonal component is; `$$s(t)=X(t)\beta$$` --- class: middle .salt[Seasonality: going Bayesian] - In the *generative* model, Prophet takes `\(\beta \sim Normal(0,\sigma^2)\)` to impose a smoothing prior on the seasonality. - Truncating the series at N applies a low-pass filter to the seasonality, so, albeit with increased risk of overfitting, increasing N allows for fitting seasonal patterns that change more quickly. - For yearly and weekly seasonality we have found N = 10 and N = 3 respectively to work well for most problems. - The choice of these parameters could be automated using a model selection procedure such as AIC. --- class: middle .heat[Holidays and Events] - Impact of a particular holiday on the time series is often similar year after year, making it an important incorporation into the forecast. - The component `\(h(t)\)` speaks for predictable events of the year including those on irregular schedules (e.g. Easter). - To utilize this feature, the user needs to provide a custom list of events. - One simple way of including this list of holidays into the model is made straightforward by assuming that the effects of holidays are independent. >It is often important to include effects for a window of days around a particular holiday, such as the weekend of Thanksgiving. To account for that we include additional parameters for the days surrounding the holiday, essentially treating each of the days in the window around the holiday as a holiday itself. - Taylor et al., (2017) --- class: middle # Some further reading - [Chapter 8 ARIMA models](https://otexts.com/fpp2/arima.html) - [Chapter 9 Dynamic regression models](https://otexts.com/fpp2/dynamic.html) - Taylor SJ, Letham B. 2017. Forecasting at scale. PeerJ Preprints 5:e3190v2 https://doi.org/10.7287/peerj.preprints.3190v2 - [Ainsworth, R. (2020). Introduction to Using GitHub (Version 1.0.0) Computer software https://doi.org/10.5281/zenodo.3932346](https://rainsworth.github.io/intro-to-github/) a fantastic resource for learning about the power of git, github and most importantly how to collaborate.